Summary


This review investigates two recent developments in artificial intelligence and neural computation: learning from imitation and the development of humanoid robots. It will be postulated that the study of imitation learning offers a promising route to gain new insights into mechanisms of perceptual motor control that could ultimately lead to the creation of autonomous humanoid robots. Imitation learning focuses on three important issues: efficient motor learning, the connection between action and perception, and modular motor control in the form of movement primitives. It will be reviewed how research on representations of, and functional connections between, action and perception has contributed to our understanding of motor acts of other beings. The recent discovery that some areas in the primate brain are active during both movement perception and execution has provided a hypothetical neural basis of imitation. Computational approaches to imitation learning will also be described, initially from the perspective of traditional AI and robotics, but also from the perspective of neural network models and statistical learning research. Parallels and differences between biological and computational approaches to imitation will be highlighted, and an overview of current projects that actually employ imitation learning for humanoid robots will be given.


In 1921 Karel Capek’s play Rossum’s Universal Robots (1) provided the first concrete vision of how a robot should look: it should look like a human being. Since then, science fiction stories have created a never-ending stream of increasingly sophisticated superhuman machines, but research has not been able to realize a robot that comes even close to Capek’s relatively “simple” artificial humans. What makes it so hard to create a human-like robot? From an engineering point of view, an argument could be made that appropriate materials, motors, power supplies, and sensors are missing to achieve the compact, compliant, and lightweight design of biology. However, even if we had access to a robotic system that incorporated all these desirable properties, and even if this machine were equipped with a supercomputer, we would still not succeed in creating a humanoid. The problem is that the algorithms required to program such a machine to achieve the versatility and flexibility of biological systems are not yet available. For the time being, robots can only solve tasks after the task has been carefully analyzed and added to the robot program by a human. An impressive example of such a procedure was recently provided by the research laboratories of the Honda Corporation in Japan (2, 3). In about 10 years of work, Honda created the first humanoid robot that can walk, climb stairs, and manipulate simple objects (Figure 1).


However, the amount of work it took to build the Honda robot, and the fact that, besides locomotion, the machine requires teleoperation to perform other tasks, are far from satisfying.


In order to overcome the need for teleoperation and manual “hard-coding” of every behavior, a learning approach is required. The most general approach to learning control is reinforcement learning (4, 5). Reinforcement learning usually requires an unambiguous representation of states and actions of the movement system and the existence of a scalar reward function. Learning proceeds by trying actions in a particular state and, based on the received, possibly delayed (6) reward, updating an evaluation function that assigns expected rewards to possible actions. After learning, the action with the highest expected value in each state is chosen to achieve the task goal.

Figure 1: Honda Humanoid Robot in a) frontal and b) side view.
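The learning loop just described can be illustrated with tabular Q-learning, one common reinforcement learning algorithm (chosen here for illustration; the text does not prescribe a specific method). This is a minimal sketch under stated assumptions: the six-state chain task, the reward of 1 delivered only at the goal, and the values of alpha, gamma, and epsilon are all hypothetical.

```python
import random

# Hypothetical toy task: a 1-D chain of states 0..N_STATES-1; the last state is the goal.
N_STATES = 6
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Return (next_state, reward); reward arrives only at the goal, i.e. it is delayed."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # Evaluation function: expected discounted reward of each (state, action) pair.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)  # explore a possibly new action
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])  # exploit, random tie-break
            nxt, r = step(s, a)
            future = 0.0 if nxt == N_STATES - 1 else max(q[(nxt, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
            s = nxt
    return q


def greedy_policy(q):
    """After learning, choose the action with the highest expected value in each non-goal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}


policy = greedy_policy(q_learning())
print(policy)
```

After learning, the greedy policy should step right (+1) in every non-goal state, because the delayed goal reward has propagated back through the evaluation function.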


An important element of learning control is the requirement to explore new actions in order to find good, or even optimal, solutions to a given movement task. For a movement system with many degrees of freedom, there is an exponential explosion in the number of actions that can be taken in every state. For example, the Honda robot in Figure 1 has 30 degrees of freedom (DOFs), each of which needs a motor command at every instant in time. Even if the command for each DOF has only three possible values, e.g., forward, backward, and zero, there are $3^{30} \approx 2 \times 10^{14}$ different actions that can be taken in every state. As it is impossible to search such huge spaces for what constitutes a good action, it is necessary either to find more compact state-action representations or to focus learning on those parts of the state-action space that are actually relevant for the movement task at hand. In this article, we will review how the latter topic can be approached in the framework of imitation learning, while the former topic, i.e., compact state-action representations, will be shown to be a natural prerequisite for imitation learning in the form of movement primitives.
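The size of this action space follows from simple counting: with k possible values for each of n independent DOFs there are k^n joint commands. A short sketch of the arithmetic, where the comparison with a two-parameter primitive uses hypothetical numbers chosen only for illustration:

```python
def combinations(n_variables: int, values_per_variable: int) -> int:
    """Number of joint settings when each variable independently takes one of the given values."""
    return values_per_variable ** n_variables


# 30 DOFs, each commanded with one of {forward, backward, zero}
honda_like = combinations(30, 3)
print(honda_like)            # 205891132094649
print(f"{honda_like:.2e}")   # 2.06e+14

# Hypothetical contrast: a movement primitive with 2 parameters (e.g. goal and duration),
# each discretized into 10 values, has only this many settings to search.
primitive_settings = combinations(2, 10)
print(primitive_settings)    # 100
```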


Movement Imitation 


Movement imitation is familiar to everybody from daily experience: a friend demonstrates a movement, and immediately we are capable of approximately repeating it. For the purpose of this review, only visually mediated imitation will be considered, although, at least in humans, verbal communication can supply important additional information. From the viewpoint of motor learning, using a teacher’s demonstration as the starting point of one’s own learning can significantly speed up the learning process: imitation drastically reduces the size of the state-action space that needs to be explored (7). With an eye towards computational approaches, we will first review some of the most relevant milestones in the study of imitation learning before looking into the promises and challenges that imitation learning poses to computational modelers.


Imitation from the Viewpoint of Behavioral and Cognitive Sciences 


In infant and animal studies, the ability to imitate is usually concluded from the subject’s increased tendency to execute a previously demonstrated behavior. However, other causes can equally account for a higher probability of the subject’s behavior, in particular priming, emulation, and response facilitation (Glossary); such causes are not to be mistaken for true imitation (8, 9). True imitation is present only if i) the imitated behavior is new for the imitator, ii) the same task strategy as that of the demonstrator is employed, and iii) the same task goal is accomplished (9). For example, if a movement is not new, response facilitation rather than imitation can account for the imitator’s behavior. However, as will be shown below, from a computational point of view, even imitative behavior in the form of response facilitation poses very complex problems. Thus, a formal understanding of the mechanisms of response facilitation would constitute significant scientific progress.


Behavioral and cognitive sciences have long been interested in the imitation of movements, from Darwin (10) and Thorndike (11) to Piaget (12). After Piaget’s work, movement imitation no longer received widespread attention, partially due to the prejudice that “imitating” or “mimicking” is not an expression of higher intelligence. This attitude changed in the 1970s, to some extent due to studies by Meltzoff and Moore (13). These authors reported the ability of 12- to 21-day-old infants and, later, even of infants less than an hour old (14) to imitate both facial and manual gestures. Infants of this age had neither seen their own faces nor been exposed to viewing the faces of other humans for any significant amount of time. Thus, the ability to map a perceived facial gesture onto their own gestures was concluded to be innate, which contradicted Piaget’s ontogenetic account of imitation (12). In the light of this new interest in imitation learning, it was discovered that many animals are unable to learn by imitation (9, 15). These findings contributed to today’s view of imitation as an important expression of higher intelligence. In recent years, a significant amount of new work has been published on imitation learning in humans and animals. Examples include comparative studies between human and monkey imitation (16), the interplay between memory and imitation and its implications for learning (17, 18), and the focus of attention while observing a demonstration (19). Imitation is also closely connected to research that investigates the link between action and perception (20, 21, 22).


Glossary 
Accommodation: The process by which an activated internal perceptual motor representation is adapted to better 
suit a new task. 
Action Level Imitation: The indiscriminate copying of the actions of the teacher without mapping them onto a more abstract motor representation.
Action: All variables (discrete or continuous) that can actively change the state of a system. Usually, actions are 
motor commands, abbreviated as a vector u. 
Assimilation: The mapping of an observed behavior onto an existing perceptual motor representation. 
Control Policy: A function that maps the state x of a movement system and its environment into an appropriate action u for a particular task, i.e., $u = \pi(x, t, \alpha)$. As indicated, the function $\pi$ can directly depend on the time, t, and on some additional parameters $\alpha$ that may be useful to adjust the policy for a particular task goal. Movement Primitives can be formalized in the form of control policies.

Deferred Imitation: Imitation that takes place a certain amount of time after the demonstration was given by the 
teacher. 

Emulation: Goals in the environment become overt due to the actions of others. Afterwards, the observer strives 
to attain the same goal. 

Forward Model: A mathematical model that predicts the time evolution of a dynamical system. For example, differential equations of motion in the form of $\dot{x} = f(x, u)$ are prototypical forward models in motor control.
Immediate Imitation: Imitation that takes place immediately after the demonstration of the teacher. 

Movement Primitive: A sequence of actions that can accomplish a certain movement goal. See Control Policy for a 
more formal definition. 

Object-Centered Representations: The representation is insensitive to the relative position and orientation of the 
perceived object to one’s own body. 

Priming: Stimuli in the environment that co-occur with the actions of the teacher increase the observer’s activation of corresponding internal representations in memory. Consequently, the observer’s exploratory behavior will be biased by these activations towards receiving similar stimuli.

Program Level Imitation: A process by which the structural organization of a behavior is copied from observing a 
teacher, while the exact details of actions are filled in by individual learning. 

Response Facilitation: Observed actions enhance corresponding action representations in memory. Afterwards, it 
is more likely that the observer performs the observed action. 

Reward Function: A function that provides a scalar (discrete or continuous) value about the goodness of an action u in a state x.

Simultaneous Imitation: Imitation that takes place concurrently with the demonstration of the teacher. 

Spline Approximations: A mathematical concept of curve fitting, usually with low-order polynomials. Complex 
curves need a sequence of concatenated splines for a good approximation. The points where a spline ends and a 
new one begins are called spline nodes. 

State: All variables (discrete or continuous) that are necessary to model a system. In this article, all states of a 
system are compactly denoted as a vector x. 

State-Action Space: The mathematical space spanned jointly by actions u and states x. Solving a movement task can be thought of as finding a path between two points in this space, the initial state and the goal state.

Stimulus Enhancement: An object or place becomes more salient due to the actions of the teacher in its vicinity or 
in conjunction with the object. This enhancement will draw the observer’s attention or elicit responses towards 
this object or place. 

Supervised Learning: Learning of an input to output mapping under the premise that an explicit error signal can 
be provided for each output. For instance, a Control Policy is a function that can be learned by supervised 
learning if both the state x and the appropriate target action u are given to the learning system. 

Task Level Learning: Learning of a task can take place by learning an appropriate Control Policy that generates commands u at the actuator level, or by learning a Control Policy that generates commands in a more abstract but task-related space, e.g., the space of the fingertip. The latter approach is called task-level learning, and it requires additional transformations to map the task-level command into actuator space. Usually, performance errors are more directly associated with task commands than with actuator commands.

Viewer-Specific Representations: The representation is sensitive to the relative position and orientation of the perceived object to one’s own body.
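To make the Control Policy and Forward Model entries concrete, here is a minimal sketch, assuming a hypothetical 1-D point mass with state x = (position, velocity). The PD feedback law standing in for π, its gains, the use of the goal position as the parameter α, and the Euler integration are illustrative choices, not part of the definitions above.

```python
from typing import Callable, Tuple

State = Tuple[float, float]  # hypothetical state x = (position, velocity) of a 1-D point mass


def forward_model(x: State, u: float, mass: float = 1.0) -> State:
    """Forward model x_dot = f(x, u): time derivative of the state under command u."""
    _, vel = x
    return (vel, u / mass)


def make_policy(goal: float, kp: float = 25.0, kd: float = 10.0) -> Callable[[State, float], float]:
    """Control policy u = pi(x, t, alpha) with alpha = goal; a PD law chosen for illustration."""
    def policy(x: State, t: float) -> float:
        pos, vel = x
        return kp * (goal - pos) - kd * vel  # this particular policy ignores t
    return policy


def simulate(policy: Callable[[State, float], float], x0: State,
             dt: float = 0.01, steps: int = 300) -> State:
    """Roll the forward model forward in time with explicit Euler integration."""
    x, t = x0, 0.0
    for _ in range(steps):
        u = policy(x, t)
        dpos, dvel = forward_model(x, u)
        x = (x[0] + dt * dpos, x[1] + dt * dvel)
        t += dt
    return x


final_state = simulate(make_policy(goal=1.0), x0=(0.0, 0.0))
print(final_state)  # close to (1.0, 0.0): the policy drives the mass to its goal
```

Changing only `goal` re-targets the same policy, which is the sense in which the parameters α adjust a policy to a particular task goal.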


Behavioral studies can provide several sources of ideas for computational motor control and imitation learning. For instance, the principle of emulation is interesting for learning how to direct the focus of attention towards favorable goals, and priming can be used to bias explorative behavior towards useful stimuli. Thus, both principles help to reduce the number of inputs to a motor system by focusing on important stimuli, but they do not reduce the combinatorial explosion of possible actions that can be explored in a particular state. Alternatively, Meltzoff and Moore’s (23) postulate of imitation as a bootstrapping mechanism for communication may offer an interesting starting point for computational modelers. The authors argued that, given a partially familiar stimulus in the environment (e.g., a face), imitating what this “stimulus” did during a previous encounter should trigger either a familiar or an unfamiliar response in how the stimulus will change. Such a strategy would help to disambiguate stimuli by acting upon them. Interestingly, we will encounter a similar interpretation of the purpose of imitation from a neurophysiological perspective, and a related computational mechanism for movement recognition based on predictive forward models in the section on neural network models of imitation. A more concrete concept of imitation was suggested by Meltzoff and Moore (24). In their ‘Active Intermodal Mapping’ (AIM) model, they suggest that visual perception of the teacher’s movement is converted into a higher-level representation that can be matched against appropriately transformed proprioceptive information about one’s own movement. If this matching space is given, imitation can be seen as learning to achieve a target representation, a problem that can be tackled with techniques from supervised learning (25). This idea can be rephrased in computational terms, as will be discussed below.


Imitation from the Viewpoint of Neuroscience and Cognitive Neuroscience 


An essential prerequisite for imitation is a connection between the sensory systems and the motor systems such that percepts can be mapped onto appropriate actions. This mapping is a difficult computational process, as visual perception takes place in a different coordinate frame from motor control. This process is also more complex than pure object recognition, since it requires integration of multiple objects (i.e., several limbs), their spatial relations, their relative and absolute movements, and even the intention of these movements. Given the current knowledge about neuroanatomy in primates, such a process is likely to happen in several steps, involving both the ventral (what) and dorsal (where, how) pathways (26) (Figure 2). From a neurophysiological point of view, a first question that can be addressed is whether there are any particular brain areas and representations that are specialized to facilitate imitation.


Perrett and co-workers (27, 28, 29) reported that neurons within the superior temporal sulcus (STs) of macaques (Figure 2) respond to both the form and the motion of objects. Interestingly, many cells were sensitive to movements of specific body parts of an observed human. For instance, cell specificity was found for faces with translatory motion in a particular direction, faces that rotated, movement of particular body parts (head, arm, leg, hip) and combinations of body parts, and even movement of the entire body. In the lower bank of the STs, similar phenomena were found for actions of the hand (28). Two important characteristics of this distributed representation are worth highlighting. First, most cells were viewer specific, i.e., there was no indication of an object-centered representation. From this property, Perrett et al. (30) concluded that STs is well suited to extract the attention and goals of others. Second, most of the form and motion neurons were insensitive to self-motion, owing to re-afferent signals reaching STs through a connection from somatosensory cortex (Figure 2) (31). Thus, STs neurons are ideally placed to analyze the movement of others without interference from one’s own body and seem to be an excellent candidate for a first processing step for imitation.

Figure 2: Sketch of a monkey brain and some areas that are hypothesized to be involved in imitation. Abbreviations denote: Ps: principal sulcus, AIs: inferior arcuate sulcus, ASs: superior arcuate sulcus, STs: superior temporal sulcus, Cs: central sulcus, Ls: lateral sulcus, IPs: intraparietal sulcus, MIP: medial intraparietal area, VIP: ventral intraparietal area, LIP: lateral intraparietal area, AIP: anterior intraparietal area, SII: secondary somatosensory cortex (adapted from Rizzolatti et al. (32) and Perrett et al. (30)). Gray areas indicate an opened sulcus. Blue arrows indicate known neuronal projections between different areas of the brain; dashed arrows indicate hypothesized connections.


What is the next processing step in a hypothesized neural pathway for imitation? Area F5 in monkeys could play a crucial role (Figure 2). Rizzolatti et al. (33) found neurons in area F5 that were specific to the execution of goal-related movements, e.g., ‘reaching’, ‘bringing-to-the-body’, ‘grasping-with-the-hand’, as well as selective to a particular type of grasp, for instance, precision grips, finger prehension, or whole-hand prehension. Interestingly, many neurons in F5 were active during the entire motor act, or at least extended parts of it, instead of just a single submovement or muscle activation. Jeannerod et al. (34) and Murata et al. (35) interpreted this firing pattern as coding complete segments of motor acts, or motor schemas (36). The possible connection to imitation, however, came with the finding that some of the neurons in F5, called “mirror neurons”, were active both when the monkey observed a specific behavior and when it executed that behavior itself (37). Mirror neurons fire highly selectively, only for a specific motor behavior directed at a particular object. These results are similar to those in STs (28), with the difference that neurons in STs do not respond to executed motor acts, but only to perceived ones. From imaging and transcranial magnetic stimulation studies, there is also some evidence that a similar mirror system exists in humans (38, 39, 40). Surprisingly, this system seems to involve Broca’s area (41), a brain region normally associated only with speech production. The possible homology of F5 in monkeys and Broca’s area in humans led some authors (41) to speculate that the ability to imitate actions and to understand them could have subserved the development of communication skills. This idea is consistent with Meltzoff and Moore’s (23) developmental work and interpretations.


Whether mirror neurons are really part of the imitation system, however, has so far remained speculative. The neurophysiological experiments on mirror neurons neither required true imitation behavior nor tested a movement repertoire that monkeys are known to imitate. Gallese and Goldman (42) instead suggest that mirror neurons participate in “mind reading”, a process accomplished by using one’s own mental apparatus to predict the psychological state of others through mental simulations. In computational approaches, mental simulations are based on predictive forward models (43). As will be discussed in the next section, forward models are likely to be an important component of an imitation learning system. From this viewpoint, simulation theory and imitation learning may share some computational mechanisms.


The connection between neurons in STs and F5, however, remains unclear at present. Area F5 receives input from SII, 7b, and AIP (Figure 2), and sends projections to F1, i.e., primary motor cortex, to the spinal cord, and also back to AIP (35). AIP neurons were shown to be sensitive to object-specific properties (shape, size, orientation), to movements and grasps executed by the monkey, or to both visual and motor stimuli (44). Given the additional observation that many AIP neurons were modulated during ongoing grasping movements, Jeannerod et al. (34) hypothesized that AIP may provide F5 with the spatial object information necessary for the continuous guidance of grasping movements. Since AIP receives input from other parietal areas, the connection to visual information through the dorsal stream seems to be complete. However, this visual information is more concerned with object properties than with the movement of others. Arbib and Rizzolatti (45) speculate that a connection from STs to area 7b and then to F5 provides the input about the movements of others. Such statements are partially supported by imaging studies that demonstrated the simultaneous activation of the inferior area 6 and STs in humans (46, 47). The STs→7b→F5 connection would complete a first hypothesis for a neurophysiological pathway of imitation.


Imitation from the Viewpoint of Robotics, Artificial Intelligence, and Neural Computation


Symbolic Approaches to Imitation Learning 


At the beginning of the 1980s, the idea of imitation learning started to attract increasing attention in the field of manipulator robotics, as it seemed to be a promising route to automating the tedious manual programming of robots. Inspired by the ideas of artificial intelligence, symbolic reasoning was commonly chosen to approach imitation, as outlined in the following (e.g., 48, 49, 50, 51, 52). During a training phase, several example movements that achieved a given task were generated under manual control of the robot. Sensor readings, e.g., position and force, were stored throughout the demonstration together with the positions and orientations of obstacles and the goal state. For imitation, the example movements were first segmented into subgoals and appropriate primitive actions to attain these subgoals. Primitive actions were commonly chosen to be the simple point-to-point movements that industrial robots employed at that time. Subgoals could be the robot’s gripper orientation and position after each primitive action, defined in a geometrical relation to the goal, or they were labeled manually by the demonstrator (50). Consequently, the demonstrated task was segmented into a sequence of state-action-state transitions. However, given some uncertainty in the environment and some variability between several demonstrated movements, it was necessary to consolidate all demonstrated movements. For this purpose, the state-action-state sequence was converted into symbolic “if-then” rules, for instance, expressing the state in terms of ‘aligned’, ‘in contact’, ‘near-to’, and actions as ‘move-to’, ‘grasp-object’, ‘move-above’, etc. Appropriate numerical definitions of these symbols, e.g., the distance threshold at which an object counts as near, were provided as prior knowledge. Such abstraction resulted in graph-based representations, each state becoming a graph node and each action a directed link between two nodes. Symbolic reasoning could then unify different graphical representations of the same task by merging and deleting nodes (49), such that the final program for imitating the demonstrated movement was obtained.
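The symbolic pipeline above can be sketched in a few lines, assuming hypothetical predicates, a hypothetical 5 cm ‘near-to’ threshold as prior knowledge, and the simplest possible unification rule (transitions with identical symbolic states share a node). The systems cited used richer rule sets; this only illustrates how numerical readings become graph nodes and how several demonstrations collapse into one graph.

```python
from collections import defaultdict

NEAR_THRESHOLD = 0.05  # hypothetical prior knowledge: distance (m) below which an object is "near"


def symbolize(distance_to_object: float, in_contact: bool) -> frozenset:
    """Map numerical sensor readings onto a set of symbolic predicates (hypothetical predicate set)."""
    if in_contact:
        return frozenset({"in-contact"})
    if distance_to_object < NEAR_THRESHOLD:
        return frozenset({"near-to"})
    return frozenset({"far-from"})


def build_graph(demonstrations):
    """Merge state-action-state transitions from several demonstrations into one graph.

    Each symbolic state is a node and each action a directed link. Transitions whose
    symbolic states coincide are stored only once, the simplest way of unifying nodes.
    """
    graph = defaultdict(set)
    for demo in demonstrations:
        for state, action, next_state in demo:
            graph[state].add((action, next_state))
    return dict(graph)


# Two demonstrations whose raw distances differ but map onto the same symbols.
demo_a = [(symbolize(0.30, False), "move-to", symbolize(0.02, False)),
          (symbolize(0.02, False), "grasp-object", symbolize(0.0, True))]
demo_b = [(symbolize(0.25, False), "move-to", symbolize(0.04, False)),
          (symbolize(0.04, False), "grasp-object", symbolize(0.0, True))]

program = build_graph([demo_a, demo_b])
for state, links in program.items():
    print(set(state), "->", links)
```

Both demonstrations collapse onto the same two nodes, which is the kind of consolidation the “if-then” abstraction was meant to achieve.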


In essence, many recent robotics approaches to imitation learning have remained closely related to the example above. New elements include the use of visual input from the teacher and movement segmentation by computer vision algorithms (53, 54, 55). Other projects used data gloves (56) or marker-based observation systems as input for imitation learning (56). More recently, research on imitation learning has been influenced increasingly by non-symbolic learning tools, for instance artificial neural networks, fuzzy logic, etc. (57, 58, 59), thus entering the category of neural computation described in the following.


Inductive Approaches to Imitation Learning 


Figure 3 sketches the major ingredients of an imitation learning system. Most research projects have focused either on the perceptual side of imitation, by investigating movement systems with low complexity (54, 60, 61) (e.g., artificial oculomotor systems, mobile robots, pick-and-place industrial robots), or on the motor side, by assuming the existence of all necessary perceptual information. In the following, we will primarily focus on projects employing the latter strategy.

Figure 3: Conceptual sketch of an imitation learning system. The right side of the figure contains primarily perceptual elements and indicates how visual information is transformed into spatial and object information. The left side focuses on motor elements, illustrating how a set of movement primitives competes for a demonstrated behavior. Motor commands are generated from the input of the most appropriate primitive. Learning can adjust both movement primitives and the motor command generator. The color of arrows matches those in Figure 2 when possible.


After spatial information about the teacher’s movement and object information has been extracted, one of the major questions becomes how such information should be converted into action. For this purpose, Figure 3 alludes to the concept of movement primitives (Glossary), also called ‘movement schemas’, ‘basis behaviors’, ‘units of action’, ‘macro actions’, etc. (36, 62, 63). Movement primitives are sequences of actions that accomplish a complete goal-directed behavior. A movement primitive can be as simple as an elementary action in the symbolic approaches to imitation, e.g., ‘go forward’, ‘go backward’, etc. However, as discussed in the introduction of this review, such low-level representations do not scale well to learning in systems with many degrees of freedom. Thus, it is useful for a movement primitive to code complete temporal behaviors, like ‘grasping a cup’, ‘walking’, ‘a tennis serve’, etc. This coding results in a compact state-action representation in which only a few parameters need to be adjusted for a specific goal. For instance, in reaching movements, the target state and movement duration are such parameters, whereas in a rhythmic movement, frequency and amplitude need to be specified (64). Using such primitives dramatically reduces the number of parameters that need to be learned for a particular movement. The drawback is that the possible movement repertoire becomes more restricted.
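A minimal sketch of two parameterized primitives, assuming a minimum-jerk position profile for the discrete reach and a sine wave for the rhythmic movement. Both profiles are illustrative stand-ins rather than the formulation of reference (64); the point is that each primitive is fully specified by two or three numbers.

```python
import math


def reach_primitive(start: float, goal: float, duration: float):
    """Discrete primitive with parameters goal and duration (minimum-jerk profile as an example)."""
    def position(t: float) -> float:
        tau = min(max(t / duration, 0.0), 1.0)  # normalized time, clipped to [0, 1]
        return start + (goal - start) * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
    return position


def rhythmic_primitive(amplitude: float, frequency_hz: float, offset: float = 0.0):
    """Rhythmic primitive with parameters amplitude and frequency (a sine wave as an example)."""
    def position(t: float) -> float:
        return offset + amplitude * math.sin(2 * math.pi * frequency_hz * t)
    return position


reach = reach_primitive(start=0.0, goal=0.4, duration=1.2)
print(reach(0.0), reach(0.6), reach(1.2))  # 0.0 0.2 0.4

wave = rhythmic_primitive(amplitude=0.1, frequency_hz=2.0)
print(round(wave(0.125), 6))  # 0.1, the peak a quarter period after t = 0
```

Re-targeting the reach only means changing `goal` or `duration`; the shape of the movement is inherited from the primitive, which is both the saving and the restriction noted above.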


In Figure 3, the perceived action of the teacher is mapped onto a set of existing primitives in an assimilation phase, similar to the approach of Demiris and Hayes (65). Subsequently, the most appropriate primitives are adjusted by learning to improve performance in an accommodation phase. Figure 3 indicates such a process by highlighting the better-matching primitives with increasing line widths. If no existing primitive is a good match for the observed behavior, a new primitive must be generated. This concept of movement primitives is closely related to the interpretation of mirror neurons in the previous section: mirror neurons are thought to code complete motor acts, i.e., primitives. However, mirror neurons seem to be a higher-level indicator of which movement primitive is appropriate rather than the site where motor commands are directly generated. Movement primitives could also form the ‘supramodal’ representation system of Meltzoff and Moore (24), in which the authors assume that matching between the perceived movement and one’s own movement takes place. The authors suggested that in early infancy, matching is based on goal states for the motor system, while, at a higher developmental stage, matching may be based on a temporal sequence of goal states, or a transition between goal states. Goal states are some of the natural parameters of a movement primitive (64), and transitions between goal states are the primitive’s spatio-temporal signature, to which Meltzoff and Moore appeal as a candidate for achieving matching between different modes of sensory information.
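The assimilation step can be sketched as a competition in which each primitive predicts the observed trajectory and the one with the smallest prediction error wins; if even the best error exceeds a threshold, a new primitive is needed. Everything here is hypothetical and chosen for illustration: the RMS error measure, the threshold, and the two toy primitives. Accommodation, which would then adjust the winning primitive’s parameters, is omitted.

```python
import math


def assimilate(observed, primitives, dt, new_primitive_threshold):
    """Return (best_name, errors); best_name is None when no primitive matches well enough.

    observed:   positions of the demonstrated movement, sampled every dt seconds.
    primitives: dict mapping a (hypothetical) primitive name to a function t -> predicted position.
    """
    errors = {}
    for name, predict in primitives.items():
        squared = [(predict(i * dt) - y) ** 2 for i, y in enumerate(observed)]
        errors[name] = math.sqrt(sum(squared) / len(squared))  # RMS prediction error
    best = min(errors, key=errors.get)
    if errors[best] > new_primitive_threshold:
        return None, errors  # no good match: a new primitive has to be generated
    return best, errors


repertoire = {
    "reach": lambda t: 0.4 * min(t, 1.0),                    # toy ramp to 0.4 within 1 s
    "oscillate": lambda t: 0.1 * math.sin(2 * math.pi * t),  # toy 1 Hz oscillation
}
familiar = [0.38 * min(i * 0.1, 1.0) for i in range(15)]     # close to "reach"
novel = [1.0 for _ in range(15)]                              # matches neither primitive

print(assimilate(familiar, repertoire, dt=0.1, new_primitive_threshold=0.05)[0])  # reach
print(assimilate(novel, repertoire, dt=0.1, new_primitive_threshold=0.05)[0])     # None
```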


Figure 3 also allows true imitation to be distinguished from response facilitation. To explain response facilitation, every movement primitive has to keep a frequency count of its past activations over a restricted temporal window. A demonstration of a familiar movement primitive by a teacher will change this frequency distribution, i.e., bias it towards the observed primitive. If spontaneous action is generated according to the probability distribution formed by the frequency counts, the demonstrated behavior is more likely to occur. The described process can readily be modeled using Bayesian statistics (25). For true imitation, no existing primitive is a good match for the demonstrated behavior, so that learning is required either to adapt an existing primitive or to generate a new one. From this viewpoint, response facilitation and true imitation largely share the same circuitry and have to tackle similar computational problems.
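One way to read the frequency-count account in Bayesian terms, offered as an assumption rather than the model of reference (25), is a categorical distribution over primitives with a symmetric Dirichlet prior: the predictive probability of primitive i is (n_i + a) / (N + K a), where n_i counts its activations in the window, N is the window’s total, K the number of primitives, and a a pseudo-count. The class and primitive names below are hypothetical.

```python
import random
from collections import Counter, deque


class PrimitiveRepertoire:
    """Hypothetical model of response facilitation over a set of familiar primitives."""

    def __init__(self, names, window=20, pseudo_count=1.0, seed=0):
        self.names = list(names)
        self.history = deque(maxlen=window)  # activations within a restricted temporal window
        self.pseudo_count = pseudo_count     # symmetric Dirichlet prior strength a
        self.rng = random.Random(seed)

    def activate(self, name):
        """Record an activation, e.g. when a teacher demonstrates a familiar primitive."""
        self.history.append(name)

    def probabilities(self):
        """Posterior predictive probability of each primitive: (n_i + a) / (N + K a)."""
        counts = Counter(self.history)
        total = len(self.history) + self.pseudo_count * len(self.names)
        return {n: (counts[n] + self.pseudo_count) / total for n in self.names}

    def spontaneous_action(self):
        """Sample a spontaneous action in proportion to the current probabilities."""
        probs = self.probabilities()
        return self.rng.choices(self.names, weights=[probs[n] for n in self.names])[0]


rep = PrimitiveRepertoire(["reach", "wave", "clap"])
print(rep.probabilities()["wave"])  # 1/3 with an empty history
for _ in range(5):
    rep.activate("wave")            # the teacher demonstrates "wave" five times
print(rep.probabilities()["wave"])  # (5 + 1) / (5 + 3) = 0.75
print(rep.spontaneous_action())     # now most likely "wave"
```

Because the window is bounded, the bias fades once newer activations push the demonstrated ones out of the history.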


Imitation Learning of Novel Behaviors 


Several research projects have focused on imitation as a method of creating novel behaviors. Three major approaches can be distinguished.


Learning a Control Policy Directly 

The demonstrated behavior can be used to learn the appropriate control policy
(Glossary) directly by supervised learning. For this purpose, the state x and the action u
of the teacher need to be observable and identifiable. This prerequisite, shared by all
forms of imitation learning, imposes a serious constraint since, normally, motor
commands and the internal variables of the teacher are hidden from the observer. Thus, a
movement primitive needs to be defined in a coordinate frame based on variables that
can be perceived, e.g., the acceleration of the fingertip in the task of pole balancing
instead of the commands sent to the motor neurons. Aboaf et al. (66) called such an
approach task-level learning, and, by analogy, the term “task-level imitation” can be used
in the context of imitation learning (Glossary). Task-level imitation requires prior
knowledge of how a task-level command (e.g., the desired acceleration of the fingertip)
can be converted into an actuator-level command. Motor control needs to be modular for
this purpose, i.e., at least separate processes for movement planning and execution need
to be assumed (67, 68).
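To make the conversion from a task-level to an actuator-level command concrete, the following Python sketch turns a desired fingertip acceleration into joint accelerations for a hypothetical two-link planar arm (the link lengths `L1` and `L2` are illustrative assumptions). It solves J(q) q̈ = ẍ_desired − J̇(q, q̇) q̇ and covers only the kinematic part of the conversion; a complete execution stage would additionally map joint accelerations to motor torques, for example with an inverse dynamics model, which is omitted here.

```python
import math

# Hypothetical two-link planar arm; link lengths are illustrative assumptions.
L1, L2 = 0.3, 0.25


def jacobian(q):
    """Jacobian of the fingertip position (x, y) with respect to joint angles q = (q1, q2)."""
    s1, c1 = math.sin(q[0]), math.cos(q[0])
    s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
    return [[-L1 * s1 - L2 * s12, -L2 * s12],
            [L1 * c1 + L2 * c12, L2 * c12]]


def jacobian_dot(q, qd):
    """Time derivative of the Jacobian for joint angles q and joint velocities qd."""
    s1, c1 = math.sin(q[0]), math.cos(q[0])
    s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
    w1, w12 = qd[0], qd[0] + qd[1]
    return [[-L1 * c1 * w1 - L2 * c12 * w12, -L2 * c12 * w12],
            [-L1 * s1 * w1 - L2 * s12 * w12, -L2 * s12 * w12]]


def task_to_joint_accel(q, qd, xdd_desired):
    """Convert a task-level command (desired fingertip acceleration) into joint accelerations."""
    J = jacobian(q)
    Jd = jacobian_dot(q, qd)
    # Right-hand side: desired acceleration minus the velocity-dependent term.
    rhs = [xdd_desired[i] - (Jd[i][0] * qd[0] + Jd[i][1] * qd[1]) for i in range(2)]
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]  # equals L1 * L2 * sin(q2)
    if abs(det) < 1e-9:
        raise ValueError("arm is at a kinematic singularity (q2 near 0 or pi)")
    # Closed-form inverse of the 2x2 Jacobian applied to rhs.
    return [(J[1][1] * rhs[0] - J[0][1] * rhs[1]) / det,
            (-J[1][0] * rhs[0] + J[0][0] * rhs[1]) / det]
```

Keeping this conversion in a separate execution stage is what allows the planning stage to operate purely on perceivable task-level variables, which is the modularity the paragraph above presupposes.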


Direct policy learning was conducted for the task of pole balancing with a
computer-simulated pole (69, 70). For this purpose, a neural network was trained on
task-level data recorded from a human demonstration. Similarly, several mobile robotics
groups adopted imitation by direct policy learning using a “robot teacher” (60, 71, 72, 73).
For example, the “robot student” followed the “robot teacher’s” movements in a
specific environment, mimicked its actions, and learned to associate which action to
choose in which state. Afterwards, the robot student had the same competence as the
teacher in this environment. Importantly, in all these direct policy learning approaches,
there is no need for the student to know the goal of the teacher. Imitation learning is
greatly simplified in this manner. However, the student will not be able to undergo
self-improvement unless an explicit reward signal, usually generated from an
optimization criterion, is provided to the student, as in the following approaches.
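Direct policy learning reduces to ordinary supervised regression from observed states to observed actions. The Python sketch below illustrates this under loudly stated assumptions: a linear policy fitted by least squares stands in for the neural network used in (69, 70), and the teacher's gains, the state ranges (pole angle and angular velocity), and the observation noise are all hypothetical. Note that nothing in the fit refers to the teacher's goal.

```python
import random


def record_demonstration(teacher_gains, n=200, noise=0.01, seed=0):
    """Simulate observable (state, action) pairs from a hypothetical linear teacher.

    The state x = (pole angle, angular velocity) and the action u are assumed to be
    directly observable, as direct policy learning requires.
    """
    rng = random.Random(seed)
    states, actions = [], []
    for _ in range(n):
        x = (rng.uniform(-0.2, 0.2), rng.uniform(-1.0, 1.0))
        u = teacher_gains[0] * x[0] + teacher_gains[1] * x[1] + rng.gauss(0.0, noise)
        states.append(x)
        actions.append(u)
    return states, actions


def fit_linear_policy(states, actions):
    """Supervised learning of u = w . x by least squares (normal equations, 2-D state)."""
    a = sum(x[0] * x[0] for x in states)
    b = sum(x[0] * x[1] for x in states)
    d = sum(x[1] * x[1] for x in states)
    r0 = sum(x[0] * u for x, u in zip(states, actions))
    r1 = sum(x[1] * u for x, u in zip(states, actions))
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ValueError("demonstration does not cover both state dimensions")
    # Closed-form solution of the 2x2 system [[a, b], [b, d]] w = [r0, r1].
    return ((d * r0 - b * r1) / det, (a * r1 - b * r0) / det)
```

The student recovers the teacher's mapping only within the states the teacher visited; with no reward signal, it has no way to improve beyond the demonstration, which is the limitation noted above.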


Learning From Demonstrated Trajectories 

A second approach to learning novel behaviors is based on building policies out of
demonstrated trajectories. This idea was explored with an anthropomorphic robot arm
for dynamic manipulation tasks, for instance, learning a tennis forehand and the game
of kendama (‘ball-in-the-cup’) (74, 75). At the outset, a human demonstrated the task,
and his/her movement was recorded with marker-based optical recording equipment.
This process yielded data about the movement of the manipulated object in Cartesian
coordinates, as well as the movement of the actuator (arm) in terms of joint angle
coordinates. For imitation learning, a hybrid strategy was chosen. Initially, the robot
aimed at indiscriminate imitation in task space based on position data of the
endeffector, while trying to use an arm posture as similar as possible to the
demonstrated posture of the teacher. Afterwards, based on manually provided knowledge
of the task goal in the form of an optimization criterion, the robot’s performance
improved by trial-and-error learning until the task was accomplished. For this purpose,
the desired endeffector trajectory of the robot was approximated by splines, and the
spline nodes, called via-points, were adjusted by supervised learning until the task was
fulfilled. Using this method, the robot learned to manipulate a stochastic, dynamic
environment within a few trials.
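The via-point representation can be illustrated with a simple greedy extraction procedure. This is a sketch under simplifying assumptions rather than the procedure of (74, 75): it handles a single coordinate, uses piecewise-linear interpolation as a stand-in for splines, and only extracts via-points from a demonstrated trajectory; the subsequent adjustment of the via-points against the task criterion is not shown.

```python
def interpolate(via_points, t):
    """Piecewise-linear stand-in for the spline through sorted (time, value) via-points."""
    for (t0, y0), (t1, y1) in zip(via_points, via_points[1:]):
        if t0 <= t <= t1:
            return y0 + (y1 - y0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the via-point range")


def extract_via_points(times, values, tol):
    """Greedily add via-points where the current approximation is worst.

    times must be strictly increasing; the loop stops once every sample is
    reproduced within tol (at worst, every sample becomes a via-point).
    """
    if len(times) < 2:
        raise ValueError("need at least two samples")
    via = [(times[0], values[0]), (times[-1], values[-1])]
    while True:
        errors = [abs(interpolate(via, t) - y) for t, y in zip(times, values)]
        worst = max(range(len(times)), key=errors.__getitem__)
        if errors[worst] <= tol:
            return via
        via.append((times[worst], values[worst]))
        via.sort()
```

A smooth demonstrated movement sampled at 51 points is typically summarized by far fewer via-points, and a straight-line movement needs only its two endpoints; this compactness is what makes via-points attractive both for learning and, as discussed later, for recognition.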


Model-based Imitation Learning 
A third approach to learning a novel primitive employs model-based learning (76, 7).
From the demonstrated behavior, the dynamics of the task are approximated in the form
of a predictive forward model (cf. 77). Given knowledge of the task goal, the task-level
policy of the movement primitive can be computed with reinforcement learning
procedures based on the learned model. For example, Schaal and Atkeson (76, 78)
showed how the model-based approach allowed an anthropomorphic robot arm to learn
the task of pole balancing in just a single trial, and the task of a “pendulum swing-up”
in only three to four trials. These authors also demonstrated that task-level imitation
based on direct policy learning, augmented with subsequent self-learning, can be rather
fragile and does not necessarily provide a significant learning speed improvement over
pure trial-and-error learning without a demonstration.


Outstanding Questions

• Learning perceptual representations: How can appropriate representations of the
identity and movement of others be developed in an automated fashion from visual
input? Is it necessary that such representations develop simultaneously with the motor
representations?

• Movement primitives: Is there a basic set of primitives that can initialize imitation
learning? How complex are the most elementary primitives in this set? How can new
primitives be learned, and old primitives be combined to form higher level movement
primitives? How are sequencing and the recognition of sequences of movement
primitives accomplished?

• Movement recognition through movement generation: Is the movement generating
mechanism directly employed for movement recognition? What representation allows
such a dual use of the motor system? Are movement primitives simultaneously
predictive forward models?

• Understanding task goals: How can the intention of a demonstrated movement be
recognized and converted to the imitator’s goal?
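The model-based route described under Model-based Imitation Learning can be sketched as follows, assuming a scalar linear task (x_next = a·x + b·u) instead of the actual pole-balancing dynamics. A forward model is fitted by least squares to demonstration data, and a policy is then computed on the learned model by iterating the discrete-time Riccati equation, i.e., a linear-quadratic regulator, which is one dynamic-programming procedure among several that could be used here. The true parameters, the teacher's noisy policy, and the cost weights `q` and `r` are hypothetical. The teacher must vary its actions somewhat; if u were an exact linear function of x, the parameters a and b could not be separated.

```python
import random


def simulate_demonstration(a_true=1.1, b_true=0.5, steps=100, seed=1):
    """Hypothetical demonstration: a teacher with variable actions on an unstable task."""
    rng = random.Random(seed)
    x = 1.0
    xs, us, xs_next = [], [], []
    for _ in range(steps):
        u = -1.0 * x + rng.gauss(0.0, 0.3)  # teacher policy plus action variability
        x_next = a_true * x + b_true * u + rng.gauss(0.0, 0.01)
        xs.append(x)
        us.append(u)
        xs_next.append(x_next)
        x = x_next
    return xs, us, xs_next


def fit_scalar_forward_model(xs, us, xs_next):
    """Least-squares fit of the forward model x_next = a*x + b*u."""
    sxx = sum(x * x for x in xs)
    sxu = sum(x * u for x, u in zip(xs, us))
    suu = sum(u * u for u in us)
    sxy = sum(x * y for x, y in zip(xs, xs_next))
    suy = sum(u * y for u, y in zip(us, xs_next))
    det = sxx * suu - sxu * sxu
    if abs(det) < 1e-12:
        raise ValueError("demonstration does not excite state and action independently")
    return (suu * sxy - sxu * suy) / det, (sxx * suy - sxu * sxy) / det


def lqr_gain(a, b, q=1.0, r=0.1, iterations=500):
    """Scalar discrete-time LQR on the learned model; the policy is u = -k * x."""
    p = q
    for _ in range(iterations):
        k = a * b * p / (r + b * b * p)
        p = q + a * a * p - a * b * p * k  # Riccati update
    return a * b * p / (r + b * b * p)
```

Because the policy is computed from the learned model rather than by trial and error on the real system, a single demonstration can suffice to obtain a stabilizing controller, in this sketch one whose closed-loop factor a − b·k lies inside (−1, 1).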


Implications for Computational Models of Imitation Learning 


The approaches discussed in the previous paragraphs illustrate some computational
ideas for how novel behaviors can be learned by imitation. Interesting insights into
these methods can be gained by analyzing how a perceived behavior is mapped onto a
set of existing primitives. Two major questions arise (24): a) what is the matching
criterion for recognizing a behavior, and b) in which coordinate frame does matching
take place?


If only the control policy of the movement primitive exists, finding a matching
criterion becomes difficult. One solution would be to try a primitive, observe its
outcome in task space, and generate a performance criterion based on the similarity
between the executed and the teacher’s behavior. This procedure needs to be repeated
for every primitive in the repertoire and is thus quite inefficient. Another possibility
arises if the primitive outputs task-level commands that can be compared directly with
the teacher’s performance. In this case, the movement primitive acts simultaneously as a
forward model (64), an approach that is described in more detail below.


The via-point method (74) can easily be adapted for movement recognition. Via-points
are a parsimonious representation of a movement and may be used for classification as
well. For example, a demonstrated movement can be transformed into via-points, and the
number and location of via-points can be compared against those of existing movement
primitives in order to choose the best match, as has been demonstrated for handwriting
and character recognition (79). Although there are various open issues with regard to
translation, scale, and rotation invariance in the via-point approach, the suggested
bidirectional interaction between perception and action is noteworthy: movement
recognition is directly accomplished with the movement generating mechanism. This
concept is compatible with what has been observed in mirror neurons, and it also ties
into other research projects that emphasize the bidirectional interaction of generative
and recognition models (80) in unsupervised learning.
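A matching step based on the number and location of via-points might look like the following Python sketch. The dissimilarity measure (a penalty per missing or extra via-point plus squared distances between corresponding via-points in temporal order), the time normalization to [0, 1], and the primitive names are all hypothetical choices made for illustration, not the method of (79). Like the approach discussed above, this naive measure is not invariant to translation, scale, or rotation.

```python
def via_point_distance(a, b, count_penalty=1.0):
    """Hypothetical dissimilarity between two via-point sets [(t, y), ...], t in [0, 1]."""
    # Differences in the number of via-points are penalized directly.
    penalty = count_penalty * abs(len(a) - len(b))
    # Corresponding via-points are compared in temporal order; extras only add the penalty.
    matched = sum((ta - tb) ** 2 + (ya - yb) ** 2
                  for (ta, ya), (tb, yb) in zip(sorted(a), sorted(b)))
    return penalty + matched


def recognize(demo_via_points, repertoire):
    """Return the name of the stored primitive whose via-points best match the demonstration."""
    return min(repertoire, key=lambda name: via_point_distance(demo_via_points, repertoire[name]))
```

For instance, a demonstration with via-points `[(0, 0), (0.45, 0.9), (1, 0.05)]` is matched to a stored three-via-point "bump" `[(0, 0), (0.5, 1), (1, 0)]` rather than to a two-via-point "up" movement `[(0, 0), (1, 1)]`; the same via-point representation thus serves both generation and recognition.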


Lastly, a third alternative should be considered, based on forward models (76, 7), but
embedded in a multiple-model competition (68, 81, 65). If every movement primitive has
a forward model, all primitives can simultaneously attempt to predict the teacher’s
behavior in the form of a mental simulation loop, indicated by the ‘efference copy’ arc
in Figure 3. The motor command will be generated by the primitives that make the
most accurate prediction. As in the via-point approach, movement recognition is based
on the movement generating mechanism, as, for instance, in Wolpert and Kawato (68)
and Demiris and Hayes (65). Such an approach is particularly easy if movement
primitives are coded in task space, since the prediction of a primitive will be directly
comparable to the teacher’s performance. If the primitive operates in actuator space,
additional coordinate transformations are needed before a task-level comparison can be
accomplished (68). Since forward models have various additional advantages for motor
control (43, 68), movement recognition based on forward models could offer the most
general and powerful solution to the perception-to-action mapping. Interestingly,
movement recognition based on forward models integrates smoothly with the simulation
theory of mind reading (42). It can also provide a computational mechanism for
Meltzoff and Moore’s (23, 24) ideas of imitation as a basic communication skill and
their “Active Intramodal Matching” model of imitation.
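The multiple-model competition can be sketched as a soft assignment of an observed task-space trajectory to primitives according to how well each primitive's forward model predicts it. This Python sketch is written in the spirit of the competition schemes cited above rather than as a reproduction of any of them: the scalar states, the Gaussian prediction-error likelihood with width `sigma`, the softmax normalization, and the three example primitives are assumptions.

```python
import math


def responsibilities(forward_models, observed, sigma=0.1):
    """Soft assignment of an observed trajectory to primitives by forward-model prediction error.

    forward_models: dict mapping a primitive name to f(state) -> predicted next state
    observed: teacher states in task space, x_0 ... x_T (at least two samples)
    """
    log_lik = {}
    for name, f in forward_models.items():
        # Each primitive predicts every next state of the teacher (mental simulation).
        err = sum((f(x) - x_next) ** 2 for x, x_next in zip(observed, observed[1:]))
        log_lik[name] = -err / (2 * sigma ** 2)
    # Softmax over log-likelihoods; subtracting the maximum keeps exp() well-behaved.
    m = max(log_lik.values())
    weights = {n: math.exp(v - m) for n, v in log_lik.items()}
    z = sum(weights.values())
    return {n: w / z for n, w in weights.items()}
```

Given hypothetical primitives `hold` (x → x), `decay` (x → 0.9x), and `drift_up` (x → x + 0.1), an observed sequence 1.0, 0.9, 0.81, 0.729 assigns nearly all responsibility to `decay`. Because the predictions live in task space, no additional coordinate transformation is needed before the comparison.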


One final issue concerns the imitation of complex motor acts that involve learning a 
sequence of primitives and when to switch between them. In this context, Fagg and 
Arbib (82) provided a model of reaching and grasping based on the known anatomy of 
the fronto-parietal circuits, including the mirror neuron system. Essentially, their 
model employed a recurrent neural network that sequenced and switched between 
motor schemas based on sensory cues. In a robotic study, Pook and Ballard (58) used 
hidden Markov models to learn appropriate sequencing from demonstrated behavior. 
There is also a large body of literature in the field of time series segmentation
(83, 84, 85) that employs competitive learning and forward models for recognition and
sequencing in a way that is easily adapted for imitation learning, as illustrated in Figure 3.
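Recovering a sequence of primitives from a demonstration with a hidden Markov model amounts to decoding the most likely sequence of hidden states. The Python sketch below shows generic Viterbi decoding in the log domain; it is not the model of Pook and Ballard (58). The primitive names, the discretized observation labels, and all probabilities are hypothetical, the parameters are assumed to be given (learning them from demonstrations, e.g., with the Baum-Welch algorithm, is not shown), and all probabilities must be nonzero because their logarithms are taken.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state (primitive) sequence under an HMM, computed in the log domain."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for obs in observations[1:]:
        col, ptr = {}, {}
        for s in states:
            # Best predecessor state for being in s at this time step.
            prev, score = max(((p, V[-1][p] + math.log(trans_p[p][s])) for p in states),
                              key=lambda item: item[1])
            col[s] = score + math.log(emit_p[s][obs])
            ptr[s] = prev
        V.append(col)
        back.append(ptr)
    # Trace the back-pointers from the best final state.
    path = [max(states, key=lambda s: V[-1][s])]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

With a hypothetical two-primitive model in which `reach` mostly emits `moving` and `grasp` mostly emits `closing`, the observation sequence `moving, moving, closing, closing` decodes to `reach, reach, grasp, grasp`, which marks the point at which the demonstrator switched between primitives.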


Is Imitation Learning the Route to Humanoid Robots? 


In the introduction of this article, we appealed to a pragmatic view of imitation
learning as a means to speed up learning in complex high-dimensional motor systems,
such as humanoid robots. This view emerged from the lack of theories of motor learning
that are able to work efficiently in high-dimensional spaces. Interestingly, the
apparently simple idea of imitation opened a Pandora’s box of important computational
questions in perceptual motor control. None of the approaches described in this article
could provide satisfying answers to questions of appropriate perceptual representations
for imitation, motor representations, and the learning of these representations.
However, consensus may be reached that research into imitation needs to include a
theory of motor learning, of compact state-action representations or movement
primitives, and of the interaction of perception and action. It is likely that these three
components cannot be studied in isolation: perceptual representations serve motor
representations, motor representations facilitate perception, and learning provides the
mutual constraints between them. Learning theories based on such reciprocal
interactions are currently under investigation in computational neuroscience (80, 86).
On this view, instead of being an idiosyncratic research topic, imitation learning could
be conceived of as a research strategy that channels investigations in computational
motor control towards the important topic of action-perception coupling.


Figure 4: Humanoid robot at the Dynamic Brain Project at the ATR Labs in Japan


While it seems fair to say that a formal understanding of imitation would be a major
step towards creating humanoid robots, biomimetic robotic systems, in particular
humanoid robots, have also become a new tool to investigate cognitive and biological
questions (87). For instance, the Cog project at the Massachusetts Institute of
Technology investigates how far a humanoid robot could become “cognitive” via a
bottom-up approach (88). The Dynamic Brain Project at the ATR Laboratories in Japan
has been conducting research with an anthropomorphic robot arm for several years
(74, 76) and is currently working with a novel humanoid robot (Figure 4) to study
theories of computational neuroscience and imitation learning. Other humanoid robot
projects include a research group at the University of Tokyo (89) and the humanoid
robotics project at Waseda University (90). The blending of psychology, neuroscience,
and engineering in such research projects seems to be a new trend that will be
beneficial for advancing knowledge in both technological and biological sciences.